
Chapter 12: Long-Term Operations, Backup, and Upgrades

etcd backup and restore operations

Backing up etcd

An etcd backup is created by taking a snapshot with the etcdctl command-line tool; the resulting backup file should be copied to a safe location outside the cluster as soon as possible.

In a kubeadm-built cluster, etcd runs inside a pod, and its data lives in /var/lib/etcd, a directory hostPath-mounted from the master node.
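The commands below reference the etcd pod through an $ETCDPOD variable; a minimal sketch for setting it, assuming kubeadm's standard component=etcd pod label:

# grab the name of the etcd static pod (assumes the kubeadm default label)
ETCDPOD=$(kubectl get pods --namespace kube-system \
  -l component=etcd -o jsonpath='{.items[0].metadata.name}')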

  • Install jq, a command-line JSON processor
sudo apt-get install jq
  • Inspect the etcd pod's volume mounts via jsonpath
kubectl get pod --namespace kube-system $ETCDPOD -o jsonpath='{.spec.containers[0].volumeMounts}' | jq
[
  {
    "mountPath": "/var/lib/etcd",
    "name": "etcd-data"
  },
  {
    "mountPath": "/etc/kubernetes/pki/etcd",
    "name": "etcd-certs"
  }
]
sudo tree /var/lib/etcd/
/var/lib/etcd/
└── member
    ├── snap
    │   ├── 0000000000000016-00000000001bc619.snap
    │   ├── 0000000000000016-00000000001bed2a.snap
    │   ├── 0000000000000016-00000000001c143b.snap
    │   ├── 0000000000000016-00000000001c3b4c.snap
    │   ├── 0000000000000016-00000000001c625d.snap
    │   └── db
    └── wal
        ├── 0.tmp
        ├── 000000000000000f-000000000015c31b.wal
        ├── 0000000000000010-0000000000173561.wal
        ├── 0000000000000011-000000000018a661.wal
        ├── 0000000000000012-00000000001a1b93.wal
        └── 0000000000000013-00000000001b8e6c.wal

3 directories, 12 files
# install the etcdctl client
sudo apt install etcd-client
sudo ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /var/lib/dat-backup.db
2024-01-10 09:50:08.219021 I | clientv3: opened snapshot stream; downloading
2024-01-10 09:50:08.260202 I | clientv3: completed snapshot read; closing
Snapshot saved at /var/lib/dat-backup.db
sudo ETCDCTL_API=3 etcdctl --write-out=table \
snapshot status /var/lib/dat-backup.db
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| ddcc0eaf |    10994 |        802 |     2.4 MB |
+----------+----------+------------+------------+
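Because the snapshot should leave the cluster as soon as possible, the save-and-copy steps are worth scripting; a minimal sketch, reusing the certificate paths above and assuming a hypothetical off-cluster host named backup-host:

#!/usr/bin/env bash
set -euo pipefail

# timestamped snapshot file
BACKUP_FILE=/var/lib/etcd-backup-$(date +%Y%m%d-%H%M%S).db

# take the snapshot with the same endpoint and certs as the manual command above
sudo ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save "$BACKUP_FILE"

# copy it off the cluster immediately (backup-host is a placeholder)
scp "$BACKUP_FILE" backup-host:/backups/etcd/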

Restoring etcd with etcdctl

$ sudo ETCDCTL_API=3 etcdctl snapshot restore /var/lib/dat-backup.db

# back up the pre-restore data, in case the restore fails
$ sudo mv /var/lib/etcd /var/lib/etcd.OLD

# move the restored data into place (snapshot restore writes to ./default.etcd by default)
$ sudo mv ./default.etcd /var/lib/etcd

# stop the etcd container so the restored data is picked up on restart
# find the etcd container ID
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps
CONTAINER       IMAGE           CREATED       STATE     NAME                       ATTEMPT   POD ID          POD
12aa2cc38d214   ead0a4a53df89   2 hours ago   Running   coredns                    0         55c19e3fe129e   coredns-5dd5756b68-m5h7c
afd918d9a67ad   ead0a4a53df89   2 hours ago   Running   coredns                    0         3a75a1b46a383   coredns-5dd5756b68-rwbt8
80f09cada5a3a   0dc86fe0f22e6   2 hours ago   Running   kube-flannel               0         dd900068e0780   kube-flannel-ds-n97mk
95cb9f516b28e   01cf8d1d322dd   2 hours ago   Running   kube-proxy                 0         fba8030e3c20e   kube-proxy-nbfck
e0eda3430381a   c527ad14e0cd5   2 hours ago   Running   kube-controller-manager    0         8617653d93c9b   kube-controller-manager-k8s-msr-1
352946671d00a   9ecc4287300e3   2 hours ago   Running   kube-apiserver             0         22c25b3226e74   kube-apiserver-k8s-msr-1
bca22a7dd2a5b   73deb9a3f7025   2 hours ago   Running   etcd                       0         027d1c481a835   etcd-k8s-msr-1
13b45c643ca97   babc03668f18a   2 hours ago   Running   kube-scheduler             0         6706492009923   kube-scheduler-k8s-msr-1

# stop it (etcd is bca22a7dd2a5b in the listing above; kubelet restarts the static pod automatically)
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock stop bca22a7dd2a5b
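As an alternative to the ./default.etcd dance above, etcdctl snapshot restore also accepts a --data-dir flag that writes the restored data directly to a target path; a sketch:

# restore straight into a dedicated directory instead of ./default.etcd
sudo ETCDCTL_API=3 etcdctl snapshot restore /var/lib/dat-backup.db \
  --data-dir /var/lib/etcd-restored
# then point the etcd hostPath at /var/lib/etcd-restored, or move it into /var/lib/etcd as above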

Upgrading a kubeadm-based cluster

Check the current versions:

kubectl version

Client Version: v1.28.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.5

kubectl get nodes

NAME        STATUS   ROLES           AGE    VERSION
k8s-msr-1   Ready    control-plane   119m   v1.28.0
k8s-wrk-1   Ready    <none>          118m   v1.28.0
k8s-wrk-2   Ready    <none>          118m   v1.28.0
kubeadm version

kubeadm version: &version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.0", GitCommit:"855e7c48de7388eb330da0f8d9d2394ee818fb8d", GitTreeState:"clean", BuildDate:"2023-08-15T10:20:15Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}

Updating the control plane

# Ubuntu example
# update kubeadm
sudo apt-mark unhold kubeadm
sudo apt-get update
sudo apt-cache policy kubeadm
sudo apt-get install -y kubeadm=$TARGET_VERSION
sudo apt-mark hold kubeadm
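Note that $TARGET_VERSION here is an apt package version, which carries a Debian packaging suffix, while kubeadm upgrade apply expects the bare Kubernetes version; hypothetical values for illustration:

# hypothetical; list the real candidates with: sudo apt-cache policy kubeadm
TARGET_VERSION=1.28.5-1.1            # apt package version, with packaging suffix
K8S_VERSION=v${TARGET_VERSION%-*}    # bare version for kubeadm upgrade apply, e.g. v1.28.5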

# drain master node
kubectl drain k8s-master --ignore-daemonsets

sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v$TARGET_VERSION

# uncordon
kubectl uncordon k8s-master

# update kubelet and kubectl
sudo apt-mark unhold kubelet kubectl
sudo apt-get update
sudo apt-get install -y kubelet=$TARGET_VERSION kubectl=$TARGET_VERSION
sudo apt-mark hold kubelet kubectl
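After installing the new kubelet, the standard kubeadm upgrade flow also restarts it:

# reload unit files and restart the upgraded kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet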

Updating the worker nodes

# Ubuntu example
# run from the master node
kubectl drain k8s-worker1 --ignore-daemonsets

# update kubeadm
sudo apt-mark unhold kubeadm
sudo apt-get update
sudo apt-get install -y kubeadm=$TARGET_VERSION
sudo apt-mark hold kubeadm

sudo kubeadm upgrade node

# update kubelet and kubectl
sudo apt-mark unhold kubelet
sudo apt-get update
sudo apt-get install -y kubelet=$TARGET_VERSION
sudo apt-mark hold kubelet
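As on the control plane, restart the kubelet once the package is upgraded:

sudo systemctl daemon-reload
sudo systemctl restart kubelet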

# back on the master node, uncordon this node
kubectl uncordon k8s-worker1

Node management

Taking a node out of scheduling

  1. In the node's YAML, set spec.unschedulable
spec:
  unschedulable: true
  2. Use the kubectl patch command
kubectl patch node k8s-node-1 -p '{"spec":{"unschedulable":true}}'
  3. Use cordon and uncordon
kubectl cordon k8s-node-1
kubectl uncordon k8s-node-1
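Whichever method is used, the result shows up in the node status; a quick check:

kubectl cordon k8s-node-1
kubectl get node k8s-node-1   # STATUS shows Ready,SchedulingDisabled
kubectl uncordon k8s-node-1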

Defining a context

kubectl config set-cluster kubernetes-cluster --server=https://192.168.1.128:8080
kubectl config set-context ctx-dev --namespace=development --cluster=kubernetes-cluster --user=dev
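For the context to take effect, the dev user must also exist in the kubeconfig and the context has to be selected; a minimal sketch, assuming token-based credentials (the token value is a placeholder):

kubectl config set-credentials dev --token=<dev-user-token>
kubectl config use-context ctx-dev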

Pod health checks

  • LivenessProbe
    • If the probe finds the container unhealthy, the kubelet kills the container, and it is then restarted according to its restart policy
  • ReadinessProbe
    • Determines whether the container has finished starting and is ready to receive traffic

Defining a liveness command
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
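To watch the probe act, apply the manifest and observe the restart once /tmp/healthy is removed; assuming the YAML above is saved as liveness-exec.yaml:

kubectl apply -f liveness-exec.yaml
kubectl describe pod liveness-exec   # events report the failing 'cat /tmp/healthy' probe after ~35s
kubectl get pod liveness-exec        # RESTARTS increments as the kubelet restarts the container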

Defining an HTTP liveness probe

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3

Defining a TCP liveness probe

apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: registry.k8s.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20